Increasing Sequence Search Sensitivity with Transitive Alignments
نویسندگان
چکیده
Sequence alignment is an important bioinformatics tool for identifying homology, but searching against the full set of available sequences is likely to result in many hits to poorly annotated sequences providing very little information. Consequently, we often want alignments against a specific subset of sequences: for instance, we are looking for sequences from a particular species, sequences that have known 3d-structures, sequences that have a reliable (curated) function annotation, and so on. Although such subset databases are readily available, they only represent a small fraction of all sequences. Thus, the likelihood of finding close homologs for query sequences is smaller, and the alignments will in general have lower scores. This makes it difficult to distinguish hits to homologous sequences from random hits to unrelated sequences. Here, we propose a method that addresses this problem by first aligning query sequences against a large database representing the corpus of known sequences, and then constructing indirect (or transitive) alignments by combining the results with alignments from the large database against the desired target database. We compare the results to direct pairwise alignments, and show that our method gives us higher sensitivity alignments against the target database.
منابع مشابه
Serial BLAST searching
MOTIVATION The translating BLAST algorithms are powerful tools for finding protein-coding genes because they identify amino acid similarities in nucleotide sequences. Unfortunately, these kinds of searches are computationally intensive and often represent bottlenecks in sequence analysis pipelines. Tuning parameters for speed can make the searches much faster, but one risks losing low-scoring a...
متن کاملHHsenser: exhaustive transitive profile search using HMM–HMM comparison
HHsenser is the first server to offer exhaustive intermediate profile searches, which it combines with pairwise comparison of hidden Markov models. Starting from a single protein sequence or a multiple alignment, it can iteratively explore whole superfamilies, producing few or no false positives. The output is a multiple alignment of all detected homologs. HHsenser's sensitivity should make it ...
متن کاملIndexing and Retrieval for Genomic Databases
Genomic sequence databases are widely used by molecular biologists for homology searching Amino acid and nucleotide databases are increasing in size exponentially and mean sequence lengths are also increasing In searching such databases it is desirable to use heuristics to perform computationally intensive local alignments on selected sequences only and to reduce the costs of the alignments tha...
متن کاملQuery-seeded iterative sequence similarity searching improves selectivity 5–20-fold
Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic to a protein family. But models are subject to contamination; once an unrelated sequence has bee...
متن کاملMSARI: multiple sequence alignments for statistical detection of RNA secondary structure.
We present a highly accurate method for identifying genes with conserved RNA secondary structure by searching multiple sequence alignments of a large set of candidate orthologs for correlated arrangements of reverse-complementary regions. This approach is growing increasingly feasible as the genomes of ever more organisms are sequenced. A program called msari implements this method and is signi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 8 شماره
صفحات -
تاریخ انتشار 2013